3 research outputs found

Morfessor and Hutmegs : Unsupervised Morpheme Segmentation for Highly-Inflecting and Compounding Languages

Author: Creutz Mathias
Lagus Krista
Linden Krister
Virpioja Sami Petteri
Publication venue: Institute of Cybernetics; Institute of the Estonian Language
Publication date: 01/01/2005

Peer reviewed

CiteSeerX

Helsingin yliopiston digitaalinen arkisto

Web augmentation of language models for continuous speech recognition of SMS text messages

Author: Creutz Mathias Johan Philip
Kovaleva Anna
Virpioja Sami Petteri
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2009

In this paper, we present an efficient query selection algorithm for the retrieval of web text data to augment a statistical language model (LM). The number of retrieved relevant documents is optimized with respect to the number of queries submitted. The querying scheme is applied in the domain of SMS text messages. Continuous speech recognition experiments are conducted on three languages: English, Spanish, and French. The web data is utilized for augmenting in-domain LMs in general and for adapting the LMs to a user-specific vocabulary. Word error rate reductions of up to 6.6% (in LM augmentation) and 26.0% (in LM adaptation) are obtained in setups where the size of the web mixture LM is limited to the size of the baseline in-domain LM.

Peer reviewed

CiteSeerX

Crossref

Helsingin yliopiston digitaalinen arkisto

Low-Resource Active Learning of Morphological Segmentation

Author: Grönroos Stig-Arne
Hiovain Katri
Jokinen Päivi Kristiina
Kurimo Mikko
Rauhala Ilona Erika
Smit Peter
Virpioja Sami Petteri
Publication date: 01/01/2016

Peer reviewed

Helsingin yliopiston digitaalinen arkisto